The Dimensionality of Language

Author

  • Isidoros Doxas
Abstract

The dimensionality of the paragraph space of five corpora of different languages (English, French, Modern Greek, Homeric Greek and German), genres (fiction and non-fiction) and intended audiences (children, adolescents and adults) is investigated. Term-by-paragraph occurrence data is processed by whitening, and the correlation dimension is calculated. All five corpora exhibit a weave-like structure, where at short distances the correlation dimension is lower than at long distances. In each case, the lower range has a dimensionality of approximately eight. The higher range varies from about twelve to about twenty-eight. Control simulations in which word instances were permuted do not exhibit two separate dimensionalities, demonstrating that the effect is determined by specific word choice, rather than by the paragraph length or word frequency properties of the corpora. By the embedding theorem (Takens, 1981), these results imply that at the lower range the trajectory can be described by between nine and seventeen ordinary differential equations, placing an important constraint on the way in which authors transition from idea to idea when constructing prose, which may be universal.
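The abstract does not say how the correlation dimension was computed. As a rough illustration only, the sketch below uses the standard Grassberger-Procaccia estimate: count the fraction C(r) of point pairs closer than r, then fit the slope of log C(r) against log r. The function names, the synthetic test data (a flat 2-D sheet rotated into 5-D) and the radius range are all assumptions made for this example; the whitening step and the corpora themselves are not reproduced here.

```python
import numpy as np


def correlation_sum(points, radii):
    """Fraction of distinct point pairs whose Euclidean distance is below each radius."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle, no self-pairs).
    pair_d = dists[np.triu_indices(len(points), k=1)]
    return np.array([(pair_d < r).mean() for r in radii])


def correlation_dimension(points, radii):
    """Slope of log C(r) versus log r over the given radii (one scaling range)."""
    radii = np.asarray(radii, dtype=float)
    c = correlation_sum(points, radii)
    mask = c > 0  # log is undefined where no pairs fall inside the radius
    slope, _intercept = np.polyfit(np.log(radii[mask]), np.log(c[mask]), 1)
    return slope


def demo_plane_in_5d(n=800, seed=0):
    """Hypothetical test case: a uniform 2-D sheet rotated into 5-D space."""
    rng = np.random.default_rng(seed)
    flat = np.zeros((n, 5))
    flat[:, :2] = rng.random((n, 2))
    # A random orthogonal matrix preserves distances, so the sheet stays 2-D.
    q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    radii = np.logspace(np.log10(0.03), np.log10(0.12), 8)
    return correlation_dimension(flat @ q, radii)


if __name__ == "__main__":
    print(f"estimated dimension: {demo_plane_in_5d():.2f}")
```

For the synthetic sheet the estimate should come out close to two even though the points live in five coordinates. A two-range structure like the one reported in the abstract would show up as different slopes when the fit is repeated separately over short and long radii.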


Similar resources

2D Dimensionality Reduction Methods without Loss

In this paper, several two-dimensional extensions of principal component analysis (PCA) and linear discriminant analysis (LDA) techniques have been applied in a lossless dimensionality reduction framework for face recognition. In this framework, the benefits of dimensionality reduction were used to improve the performance of its predictive model, which was a support vector machine (...


Wiener Way to Dimensionality

This note introduces a new general conjecture correlating the dimensionality d_T of an infinite lattice with N nodes to the asymptotic value of its Wiener Index W(N). In the limit of large N the general asymptotic behavior W(N) ≈ N^s is proposed, where the exponent s and d_T are related by the conjectured formula s = 2 + 1/d_T, allowing a new definition of dimensionality d_W = (s - 2)^(-1). Being related to the t...


A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...


Location and dimensionality estimation of geological bodies using eigenvectors of "Computed Gravity Gradient Tensor"

One of the methodologies employed in gravimetry exploration is eigenvector analysis of Gravity Gradient Tensor (GGT) which yields a solution including an estimation of a causative body’s Center of Mass (COM), dimensionality and strike direction. The eigenvectors of GGT give very rewarding clues about COM and strike direction. Additionally, the relationships between its components provide a quan...


Impact of linear dimensionality reduction methods on the performance of anomaly detection algorithms in hyperspectral images

Anomaly Detection (AD) has recently become an important application of hyperspectral image analysis. The goal of these algorithms is to find the objects in the image scene which are anomalous in comparison to their surrounding background. One way to improve the performance and runtime of these algorithms is to use Dimensionality Reduction (DR) techniques. This paper evaluates the effect of thr...


Diagnosis of Diabetes Using an Intelligent Approach Based on Bi-Level Dimensionality Reduction and Classification Algorithms

Objective: Diabetes is one of the most common metabolic diseases. Earlier diagnosis of diabetes and treatment of hyperglycemia and related metabolic abnormalities are of vital importance. Diagnosis of diabetes via proper interpretation of the diabetes data is an important classification problem. Classification systems help clinicians to predict the risk factors that cause diabetes or pre...




Publication date: 2007